NLP-enhanced Content Filtering Within the POESIA Project

Authors

  • Mark Hepple
  • Neil Ireson
  • Paolo Allegrini
  • Simone Marchi
  • Simonetta Montemagni
  • José María Gómez Hidalgo

Abstract

This paper introduces POESIA, an open-source internet filtering system that combines standard filtering methods, such as positive/negative URL lists, with more advanced techniques, such as image processing and NLP-enhanced text filtering. The description here focusses on the components that provide textual content filtering for three European languages (English, Italian and Spanish) and that employ NLP methods to enhance performance. We also address the acquisition of the language data needed to develop these filters, and the evaluation of the system and its components.

Similar resources

Text filtering at POESIA: a new Internet content filtering tool for educational environments

The Internet gives children easy access to pornography and other harmful material. To improve the effectiveness of existing filters, we present POESIA, a project whose objective is to develop and evaluate extensible open-source Internet filtering software for educational environments.

Text Categorization for Internet Content Filtering

Text Filtering is one of the most challenging and useful tasks in the Multilingual Information Access field. In a number of filtering applications, Automated Text Categorization of documents plays a key role. In this paper, we present two such applications (Hermes and POESIA), focused on personalized news delivery and the blocking of inappropriate Internet content, respectively. We are specifically...

Intelligent E-Commerce with Guiding Agents based on Personalized Interaction Tools

Project COGITO aims at an agent-based interface for B-to-C applications that is not merely re-active to some user request, but pro-active and capable of engaging in a goal-directed conversation with the user, e.g., by taking the initiative to recommend new products. The approach combines content-based filtering, where user profiles are generated based on content features extracted from document...

Feeding OWL: Extracting and Representing the Content of Pathology Reports

This paper reports on an ongoing project that combines NLP with semantic web technologies to support a content-based storage and retrieval of medical pathology reports. We describe the NLP component of the project (a robust parser) and the background knowledge component (a domain ontology represented in OWL), and how they work together during extraction of domain specific information from natur...

Improved Document Representation for Classification Tasks for the Intelligence Community

Research within a larger, multi-faceted risk assessment project for the Intelligence Community (IC) combines Natural Language Processing (NLP) and Machine Learning techniques to detect potentially malicious shifts in the semantic content of information either accessed or produced by insiders within an organization. Our hypothesis is that the use of fewer, more discriminative linguistic features...

Publication date: 2004